# LLM Evaluation Integration Guide

This guide provides a concise overview of integrating LLM evaluation using DeepEval. DeepEval is an open-source framework for evaluating LLM applications, similar to Pytest but specialized for LLM unit testing. It incorporates metrics such as hallucination, answer relevancy, RAGAS, and more, and supports various LLM applications, whether implemented via RAG or fine-tuning, LangChain or LlamaIndex.

## Key Features

DeepEval offers a variety of LLM evaluation metrics, including G-Eval, Summarization, Answer Relevancy, Faithfulness, Contextual Recall, Contextual Precision, RAGAS, Hallucination, Toxicity, Bias, and more. It allows bulk evaluation of your entire dataset in under 20 lines of Python code, and supports the creation of custom metrics. DeepEval is also integrated with Confident AI for continuous evaluation throughout the lifetime of your LLM application.
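As a rough illustration of bulk evaluation, the sketch below scores a small list of question and answer pairs with a single metric. It assumes DeepEval's `evaluate`, `LLMTestCase`, and `AnswerRelevancyMetric` as found in recent releases. The sample data, the `run_bulk_evaluation` helper, and the 0.5 threshold are illustrative choices, not part of DeepEval. Check the names against the version you have installed.

```python
# Illustrative (input, actual_output) pairs; replace with your own dataset.
SAMPLES = [
    ("What is the capital of France?", "The capital of France is Paris."),
    ("How many days are in a leap year?", "A leap year has 366 days."),
]


def run_bulk_evaluation(samples):
    """Score every (input, actual_output) pair with one metric (hypothetical helper)."""
    # Imported lazily so this module can be imported without deepeval installed.
    from deepeval import evaluate
    from deepeval.metrics import AnswerRelevancyMetric
    from deepeval.test_case import LLMTestCase

    test_cases = [
        LLMTestCase(input=question, actual_output=answer)
        for question, answer in samples
    ]
    # The 0.5 threshold is an arbitrary example value.
    metric = AnswerRelevancyMetric(threshold=0.5)
    return evaluate(test_cases=test_cases, metrics=[metric])
```

Calling `run_bulk_evaluation(SAMPLES)` needs DeepEval installed and, for this metric, an `OPENAI_API_KEY` in the environment, since the score is produced by an LLM judge.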

## Integrations

DeepEval is integrated by default with deployed applications. Set the sampling rate directly in `config.json`. The ability to choose which metrics are applied is planned but not yet available.
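To make the idea of a sampling rate concrete, here is a minimal sketch of how a rate read from `config.json` could decide which requests get evaluated. The `sampling_rate` key name, the helper functions, and the 0.25 value are assumptions for illustration only; the actual schema of `config.json` is not described in this guide.

```python
import json
import random


def load_sampling_rate(config_text: str, default: float = 0.0) -> float:
    """Read a hypothetical "sampling_rate" key and clamp it to [0, 1]."""
    config = json.loads(config_text)
    rate = float(config.get("sampling_rate", default))
    return min(1.0, max(0.0, rate))


def should_evaluate(rate: float, rng: random.Random) -> bool:
    """Return True for roughly a fraction `rate` of calls."""
    return rng.random() < rate


# Example config.json contents; the key name is an assumption.
example_config = '{"sampling_rate": 0.25}'
rate = load_sampling_rate(example_config)

rng = random.Random(0)  # seeded so repeated runs sample the same requests
sampled = sum(should_evaluate(rate, rng) for _ in range(10_000))
print(f"Evaluated {sampled} of 10000 requests")
```

A rate of 0.25 means roughly one request in four is evaluated, which trades evaluation cost against coverage.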

## QuickStart

To get started with DeepEval, install it using pip, create an account for logging test results, and write your first test case. Then set `OPENAI_API_KEY` as an environment variable and run the test case from the CLI. For more detailed instructions, refer to the [DeepEval Documentation](https://deepeval.com/docs).

## Summary

DeepEval provides a comprehensive solution for evaluating LLM applications, offering a wide range of metrics, support for custom metrics, and integration with other tools. By using DeepEval, developers can effectively evaluate their LLM applications and make informed decisions to improve their performance.